An Adaptive and Efficient Dimensionality Reduction Algorithm for High-Dimensional Indexing

Authors

  • Hui Jin
  • Beng Chin Ooi
  • Heng Tao Shen
  • Cui Yu
  • Aoying Zhou
Abstract

The notorious “dimensionality curse” is a well-known phenomenon for any multi-dimensional index attempting to scale up to high dimensions. One well-known approach to overcoming degradation in performance with respect to increasing dimensions is to reduce the dimensionality of the original dataset before constructing the index. However, identifying the correlation among the dimensions and effectively reducing them is a challenging task. In this paper, we present an adaptive Multi-level Mahalanobis-based Dimensionality Reduction (MMDR) technique for high-dimensional indexing. Our MMDR technique has three notable features compared to existing methods. First, it discovers elliptical clusters using only the low-dimensional subspaces. Second, data points in the different axis systems are indexed using a single B+-tree. Third, our technique is highly scalable in terms of data size and dimensionality. An extensive performance study using both real and synthetic datasets was conducted, and the results show that our technique not only achieves higher precision, but also enables queries to be processed efficiently.
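The Mahalanobis distance that gives MMDR its name can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `mahalanobis_distance` and the cluster mean and covariance below are assumptions chosen for illustration. The sketch only shows why a Mahalanobis measure suits elliptical clusters: two points at the same Euclidean distance from the centre are ranked differently once the cluster's shape is taken into account.

```python
import numpy as np

def mahalanobis_distance(x, mean, cov):
    """Distance of point x from a cluster with the given mean and covariance."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov @ y = diff rather than forming the explicit inverse.
    y = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ y))

# Hypothetical elliptical cluster: wide along the first axis, narrow along the second.
mean = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])

a = np.array([2.0, 0.0])  # lies along the long axis of the ellipse
b = np.array([0.0, 2.0])  # lies along the short axis of the ellipse

# Both points are at Euclidean distance 2 from the mean.
d_a = mahalanobis_distance(a, mean, cov)  # 1.0
d_b = mahalanobis_distance(b, mean, cov)  # 2.0
```

Under this measure, point `a` is closer to the cluster than point `b`, because the cluster is more spread out along the first axis.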

Similar resources

An efficient high-dimensional indexing method for content-based retrieval in large image databases

High-dimensional indexing methods have proved quite useful for improving response time. Based on Euclidean distance, many of them have been proposed for applications where data vectors are high-dimensional. However, these methods generally do not support efficient similarity search when dealing with heterogeneous data vectors. In this paper, we propose a high-dimensional indexing metho...

Efficient Similarity Indexing and Searching in High Dimensions

Efficient indexing and searching of high-dimensional data has been an area of active research due to the growing exploitation of high-dimensional data and the vulnerability of traditional search methods to the "curse of dimensionality". This paper presents a new approach for fast and effective searching and indexing of high-dimensional features using random partitions of the feature space. Ex...

Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces

Many emerging application domains require database systems to support efficient access over highly multidimensional datasets. The current state-of-the-art technique for indexing high-dimensional data is to first reduce the dimensionality of the data using Principal Component Analysis and then index the reduced-dimensionality space using a multidimensional index structure. The above technique, ...

Indexing Reduced Dimensionality Spaces Using Single Dimensional Indexes

The dimensionality curse has greatly affected the scalability of high-dimensional indexes. A well-known approach to improving the indexing performance is dimensionality reduction before indexing the data in the reduced-dimensionality space. However, the reduction may cause loss of distance information when the data set is not globally correlated. To reduce loss of information and degradation of ...

Concept Indexing: A Fast Dimensionality Reduction Algorithm with Applications to Document Retrieval & Categorization

In recent years, we have seen a tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intranets. This has led to an increased interest in developing methods that can efficiently categorize and retrieve relevant information. Retrieval techniques based on dimensionality reduction, such as Latent Semantic Indexing (LSI), have...

Publication date: 2003